In this paper, we present SSDNet, a novel deep learning approach for time series forecasting. SSDNet combines the Transformer architecture with state space models to provide probabilistic and interpretable forecasts, including trend and seasonality components and an indication of which past time steps are important for the prediction. The Transformer architecture is used to learn the temporal patterns and to estimate the parameters of the state space model directly and efficiently, without the need for Kalman filters. We comprehensively evaluate the performance of SSDNet on five datasets, showing that SSDNet is an effective method in terms of both accuracy and speed, outperforming state-of-the-art deep learning and statistical methods, and able to provide meaningful trend and seasonality components.
translated by Google Translate
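The kind of decomposition SSDNet exposes can be illustrated with a plain linear state space model. The matrices below (local linear trend plus seasonal dummies) are a textbook formulation rather than SSDNet's actual parameterization, and the initial state `a0` stands in for the quantities the Transformer would estimate:

```python
import numpy as np

def build_matrices(season_len: int):
    """Local linear trend + seasonal-dummy state space matrices.

    State vector: [level, slope, s_1, ..., s_{m-1}] where m = season_len.
    """
    m = season_len
    d = 2 + (m - 1)
    T = np.zeros((d, d))
    T[0, 0] = T[0, 1] = 1.0      # level_{t+1} = level_t + slope_t
    T[1, 1] = 1.0                # slope_{t+1} = slope_t
    T[2, 2:] = -1.0              # seasonal dummies sum to zero over a period
    T[3:, 2:-1] = np.eye(m - 2)  # shift the remaining seasonal dummies
    z = np.zeros(d)
    z[0] = 1.0                   # observe the level ...
    z[2] = 1.0                   # ... plus the current seasonal effect
    return T, z

def forecast(a0, season_len, horizon):
    """Roll the deterministic state recursion forward; return the mean
    forecast split into its trend and seasonal components."""
    T, _z = build_matrices(season_len)
    a = np.asarray(a0, dtype=float)
    trend, seasonal = [], []
    for _ in range(horizon):
        trend.append(a[0])
        seasonal.append(a[2])
        a = T @ a
    trend, seasonal = np.array(trend), np.array(seasonal)
    return trend + seasonal, trend, seasonal
```

Because the forecast is emitted as a sum of named state components, the trend and seasonal series can be plotted separately, which is the sense in which such models are interpretable.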
A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that, across 6 architectures trained with 4 algorithms on massive datasets, they exhibit little compositionality. To arrive at this conclusion, we introduce a new compositionality evaluation benchmark CREPE which measures two important aspects of compositionality identified by cognitive science literature: systematicity and productivity. To measure systematicity, CREPE consists of three test datasets. The three test sets are designed to test models trained on three of the popular training datasets: CC-12M, YFCC-15M, and LAION-400M. They contain 385K, 385K, and 373K image-text pairs and 237K, 210K, and 178K hard negative captions. To test productivity, CREPE contains 17K image-text pairs with nine different complexities plus 246K hard negative captions with atomic, swapping, and negation foils. The datasets are generated by repurposing the Visual Genome scene graphs and region descriptions and applying handcrafted templates and GPT-3. For systematicity, we find that model performance decreases consistently when novel compositions dominate the retrieval set, with Recall@1 dropping by up to 8%. For productivity, models' retrieval success decays as complexity increases, frequently nearing random chance at high complexity. These results hold regardless of model and training dataset size.
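Retrieval Recall@K against a hard-negative candidate set, as in CREPE's evaluation, can be computed directly from similarity scores. The function below is an illustrative sketch (ties with a negative count against the true caption), not CREPE's evaluation code:

```python
import numpy as np

def recall_at_k(sim_true, sim_negatives, k=1):
    """Image-to-text retrieval Recall@K.

    sim_true: shape (N,), similarity of each image to its ground-truth caption.
    sim_negatives: shape (N, M), similarities to M hard-negative captions.
    An image counts as a hit if the true caption ranks in the top K of the
    (M + 1)-candidate retrieval set.
    """
    sim_true = np.asarray(sim_true, dtype=float)
    sim_neg = np.asarray(sim_negatives, dtype=float)
    # rank of the true caption = number of negatives scoring at least as high
    rank = (sim_neg >= sim_true[:, None]).sum(axis=1)
    return float((rank < k).mean())
```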
Vision models often fail systematically on groups of data that share common semantic characteristics (e.g., rare objects or unusual scenes), but identifying these failure modes is a challenge. We introduce AdaVision, an interactive process for testing vision models which helps users identify and fix coherent failure modes. Given a natural language description of a coherent group, AdaVision retrieves relevant images from LAION-5B with CLIP. The user then labels a small amount of data for model correctness, which is used in successive retrieval rounds to hill-climb towards high-error regions, refining the group definition. Once a group is saturated, AdaVision uses GPT-3 to suggest new group descriptions for the user to explore. We demonstrate the usefulness and generality of AdaVision in user studies, where users find major bugs in state-of-the-art classification, object detection, and image captioning models. These user-discovered groups have failure rates 2-3x higher than those surfaced by automatic error clustering methods. Finally, finetuning on examples found with AdaVision fixes the discovered bugs when evaluated on unseen examples, without degrading in-distribution accuracy, and while also improving performance on out-of-distribution datasets.
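The retrieval loop can be sketched with plain cosine similarity over precomputed embeddings. `refine_query` below is a schematic stand-in for AdaVision's hill-climbing round (the real system queries LAION-5B with CLIP embeddings and folds in user correctness labels); the step rule is an assumption for illustration:

```python
import numpy as np

def top_k_cosine(query, corpus, k):
    """Indices of the k corpus vectors most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return np.argsort(-(c @ q))[:k]

def refine_query(query, failure_embs, step=0.5):
    """Nudge the group embedding toward the mean of user-labeled failures,
    so the next retrieval round concentrates on the high-error region."""
    direction = failure_embs.mean(axis=0) - query
    new = query + step * direction
    return new / np.linalg.norm(new)
```

In each round, the user labels the retrieved images, the failures feed `refine_query`, and retrieval repeats until the group saturates.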
Everyday mobility poses significant problems for visually impaired people. Some of our previous work has therefore applied computer vision to develop assistance systems that guide visually impaired people in critical situations. Such situations include pedestrian crossings at road intersections and stairs in indoor and outdoor environments. This paper presents a framework for evaluating computer vision-based guidance of visually impaired people in such critical situations. The proposed framework includes an interface for labeling and storing reference human decisions on guidance directions and for comparing them with the decisions made by the computer vision-based system. Since rigorous evaluation methodology is not clearly defined in this research area, and because of the specifics of transferring information to visually impaired people, evaluation criteria for specific simplified guidance instructions are proposed.
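Comparing the system's decisions against stored human reference decisions reduces to a per-instruction agreement computation. The instruction set below is hypothetical, chosen only to illustrate the shape of such an evaluation:

```python
from collections import Counter

def agreement_report(human, system, classes=("left", "right", "forward", "stop")):
    """Overall and per-instruction agreement between reference human
    decisions and the computer-vision system's decisions.

    The `classes` tuple is an illustrative instruction vocabulary, not the
    paper's actual guidance instruction set."""
    assert len(human) == len(system)
    totals, hits = Counter(), Counter()
    for h, s in zip(human, system):
        totals[h] += 1
        if h == s:
            hits[h] += 1
    report = {c: (hits[c] / totals[c] if totals[c] else None) for c in classes}
    overall = sum(hits.values()) / len(human)
    return overall, report
```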
Medical image segmentation often requires segmenting multiple elliptical objects in a single image. Among other tasks, this includes segmenting vessels such as the aorta in axial CTA slices. In this paper, we propose a general approach for improving the semantic segmentation performance of neural networks on these tasks, and we validate our approach on the aorta segmentation task. We use a cascade of two neural networks, where one performs a rough segmentation based on the U-Net architecture and the other performs the final segmentation on polar image transformations of the input. Connected component analysis of the rough segmentation is used to construct the polar transformations, and predictions on multiple transformations of the same image are fused using hysteresis thresholding. We show that this approach improves aorta segmentation performance without requiring complex neural network architectures. In addition, we show that our approach improves robustness and pixel-level recall while achieving segmentation performance in line with the state of the art.
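The polar transform feeding the second-stage network can be sketched as nearest-neighbor resampling around a centroid. Grid resolution and the interpolation scheme below are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def to_polar(img, center, n_r=64, n_theta=128):
    """Nearest-neighbor polar transform of a 2-D image around `center`.

    Rows index radius, columns index angle; out-of-bounds samples are 0.
    In the cascade, `center` would come from connected-component analysis
    of the rough segmentation."""
    h, w = img.shape
    cy, cx = center
    r_max = np.hypot(max(cy, h - 1 - cy), max(cx, w - 1 - cx))
    rs = np.linspace(0, r_max, n_r)
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    yy = np.rint(cy + rs[:, None] * np.sin(thetas)[None, :]).astype(int)
    xx = np.rint(cx + rs[:, None] * np.cos(thetas)[None, :]).astype(int)
    valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
    out = np.zeros((n_r, n_theta), dtype=img.dtype)
    out[valid] = img[yy[valid], xx[valid]]
    return out
```

In this coordinate system a roughly elliptical vessel centered on `center` becomes a nearly horizontal band, which is an easier shape for a segmentation network to delineate.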
Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the Wilds 2.0 update, which extends 8 of the 10 datasets in the Wilds benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. To maintain consistency, the labeled training, validation, and test sets, as well as the evaluation metrics, are exactly the same as in the original Wilds benchmark. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on Wilds 2.0 is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.
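As a toy illustration of one family of methods benchmarked here, a single self-training round can be sketched with a nearest-centroid model standing in for the real network. The confidence rule (a distance threshold) is an assumption for illustration:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Fit one centroid per class."""
    labels = np.unique(y)
    return labels, np.stack([X[y == c].mean(axis=0) for c in labels])

def predict(X, labels, centroids):
    """Return predicted labels and distance to the nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)], d.min(axis=1)

def self_train(X_lab, y_lab, X_unlab, max_dist=1.0):
    """One self-training round: pseudo-label the unlabeled points the
    current model is confident about (here: close to a centroid), add
    them to the training set, and refit."""
    labels, cents = nearest_centroid_fit(X_lab, y_lab)
    pseudo, dist = predict(X_unlab, labels, cents)
    keep = dist <= max_dist
    X_all = np.vstack([X_lab, X_unlab[keep]])
    y_all = np.concatenate([y_lab, pseudo[keep]])
    return nearest_centroid_fit(X_all, y_all)
```

The benchmark's finding is precisely that rounds like this, scaled up to deep networks, yield only limited gains under the realistic shifts in Wilds 2.0.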
Terms are linguistic signifiers of domain-specific concepts. Automated recognition of multi-word terms (MWTs) in free text is a sequence labeling problem, which is commonly addressed using supervised machine learning methods. Their need for manually annotated training data makes it difficult to port such methods across domains. FlexiTerm, on the other hand, is a fully unsupervised method for MWT recognition from domain-specific corpora. Originally implemented in Java as a proof of concept, it did not scale well, and thus offered little practical value in the context of big data. In this paper, we describe its re-implementation in Python and compare the performance of the two implementations. The results demonstrate major improvements in efficiency, allowing FlexiTerm to transition from a proof of concept to a production-grade application.
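The flavor of unsupervised MWT recognition can be sketched by ranking normalized n-gram candidates by corpus frequency; normalizing (lowercasing and sorting tokens) collapses simple variants such as inverted word order. This is a schematic of the general idea, not FlexiTerm's actual candidate extraction or term-hood scoring:

```python
import re
from collections import Counter

def extract_mwt_candidates(texts, min_len=2, max_len=4, min_freq=2):
    """Rank multi-word term candidates by frequency of their normalized form.

    Returns (surface_form, frequency) pairs for candidates whose normalized
    form occurs at least `min_freq` times across the corpus."""
    counts = Counter()
    surface = {}
    for text in texts:
        tokens = re.findall(r"[a-z][a-z-]*", text.lower())
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                key = " ".join(sorted(gram))     # order-insensitive key
                counts[key] += 1
                surface.setdefault(key, " ".join(gram))
    return [(surface[k], c) for k, c in counts.most_common() if c >= min_freq]
```

A real system would add linguistic filters (e.g., noun-phrase patterns) and a term-hood score on top of raw frequency.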
Distribution shifts, where the training distribution differs from the test distribution, can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present Wilds, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training yields substantially lower out-of-distribution than in-distribution performance. This gap remains even with models trained by existing methods for tackling distribution shifts, underscoring the need for new methods for training models that are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at https://wilds.stanford.edu.